Remove exposure on column construction and unwrap buffers on pylibcudf conversion #20980
Merged
rapids-bot[bot] merged 10 commits into rapidsai:main on Jan 8, 2026
Remove the `copy` parameter from DataFrame.to_pylibcudf(), Series.to_pylibcudf(), and Index.to_pylibcudf(), since it was never implemented and always raised NotImplementedError. Updated the docstrings to clarify that these methods always perform zero-copy operations and return views of the existing data.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5
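To make "zero-copy, returns a view" concrete, here is a minimal sketch that does not use cudf or pylibcudf at all (both need a GPU). The hypothetical `ToyFrame` class and its `to_view()` method stand in for `to_pylibcudf()`: writes through the returned view land in the frame's own storage, so a caller that needs independent data has to copy explicitly.

```python
class ToyFrame:
    """Hypothetical frame used only to illustrate view (zero-copy) semantics."""

    def __init__(self, values):
        # Backing storage owned by the frame.
        self._data = bytearray(values)

    def to_view(self):
        # Zero-copy: the returned memoryview aliases self._data.
        return memoryview(self._data)


frame = ToyFrame([1, 2, 3])
view = frame.to_view()
view[0] = 42                 # writes through to the frame's storage
print(frame._data[0])        # 42

independent = bytes(frame.to_view())  # explicit copy when independence is needed
frame._data[1] = 7
print(independent[1])        # 2: the copy is unaffected
```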
Add a deep copy method for pylibcudf.Table that creates a new table with deep copies of all columns. The implementation follows the same pattern as Column.copy(), accepting optional stream and memory resource parameters.

Changes:
- Added Table.copy() method in table.pyx
- Added method declaration in table.pxd
- Added type stub in table.pyi
- Added test_table_copy() to verify deep copy behavior

The copy method iterates over all columns and calls copy() on each, ensuring complete independence between the original and copied tables.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5
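The per-column copy pattern can be sketched in plain Python. `ToyColumn` and `ToyTable` are hypothetical stand-ins for `pylibcudf.Column` and `pylibcudf.Table`, and the sketch omits the optional stream and memory resource parameters that the real method accepts.

```python
from dataclasses import dataclass


@dataclass
class ToyColumn:
    # Hypothetical stand-in for a column; data is a plain list.
    data: list

    def copy(self) -> "ToyColumn":
        # Deep copy: new storage, no aliasing with self.data.
        return ToyColumn(list(self.data))


@dataclass
class ToyTable:
    columns: list

    def copy(self) -> "ToyTable":
        # Build a new table from a deep copy of every column.
        return ToyTable([col.copy() for col in self.columns])


original = ToyTable([ToyColumn([1, 2]), ToyColumn([3, 4])])
duplicate = original.copy()
duplicate.columns[0].data[0] = 99
print(original.columns[0].data[0])  # 1: the original is untouched
```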
With the buffer unwrapping changes in to_pylibcudf(), buffers are no longer truly "exposed" during construction. The pylibcudf source objects maintain references to the original buffer owners, ensuring data stays alive even if cudf's wrapper is deleted during a spill.

Changes:
- Remove exposed parameter from BufferOwner.__init__() and from_device_memory()
- Remove exposed parameter from SpillableBufferOwner.from_device_memory()
- Remove exposed parameter from as_buffer() utility function
- Remove exposed parameter from all Column type constructors
- Remove data_ptr_exposed parameter from ColumnBase.from_pylibcudf()
- Remove data_ptr_exposed parameter from ColumnBase.from_cuda_array_interface()
- Remove data_ptr_exposed=True from high-level API call sites

Buffers are now only marked as exposed through implicit detection when:
- The raw ptr property is accessed outside an access context
- __cuda_array_interface__["data"][0] is accessed
- scope="external" is used in an access context

This ensures correct spillable buffer behavior and copy-on-write semantics.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5
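The three implicit-exposure triggers can be modeled as a small state machine. This is a hypothetical sketch, not cudf's `BufferOwner`: `ToyBufferOwner`, its `access()` context manager, and the simplified `__cuda_array_interface__` (which here carries only the `data` entry) are assumptions made for illustration.

```python
from contextlib import contextmanager


class ToyBufferOwner:
    """Hypothetical model of implicit exposure detection (not cudf's class)."""

    def __init__(self, ptr: int):
        self._ptr = ptr
        self.exposed = False     # the constructor never marks a buffer exposed
        self._access_depth = 0   # > 0 while inside an access context

    @contextmanager
    def access(self, scope: str = "internal"):
        # scope="external": the pointer may escape to code we do not control.
        if scope == "external":
            self.exposed = True
        self._access_depth += 1
        try:
            yield self
        finally:
            self._access_depth -= 1

    @property
    def ptr(self) -> int:
        # Reading the raw pointer outside any access context exposes the buffer.
        if self._access_depth == 0:
            self.exposed = True
        return self._ptr

    @property
    def __cuda_array_interface__(self) -> dict:
        # Simplified: a real interface dict also has shape, typestr, version.
        with self.access(scope="external"):
            return {"data": (self.ptr, False)}


buf = ToyBufferOwner(ptr=0x1000)
with buf.access():
    _ = buf.ptr              # internal use inside a context
print(buf.exposed)           # False
data_ptr = buf.__cuda_array_interface__["data"][0]
print(buf.exposed)           # True
```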
…_args

Replaced individual __init__ overrides in column subclasses with a unified _validate_args classmethod pattern. This eliminates code duplication and centralizes validation logic.

Changes:
- Added ColumnBase._validate_args() classmethod for plc_column/dtype validation
- Removed __init__ overrides from NumericalColumn, DatetimeColumn, TimeDeltaColumn, DecimalBaseColumn, ListColumn, StructColumn, and IntervalColumn
- StringColumn retains a minimal __init__ for instance attribute initialization
- Removed deprecated _validate_dtype_instance methods
- Each column type now overrides _validate_args() for type-specific validation

Benefits:
- Reduces code duplication across 8+ column types
- Centralizes validation logic in one place
- Maintains all existing validation behavior
- Easier to maintain and extend

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5
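A minimal sketch of the `_validate_args` pattern, assuming a simplified constructor that takes only `plc_column` and `dtype` and uses plain string dtypes. `ToyColumnBase` and `ToyNumericalColumn` are hypothetical; the real cudf signatures and checks may differ.

```python
class ToyColumnBase:
    """Hypothetical base: one __init__, validation delegated to a classmethod."""

    def __init__(self, plc_column, dtype):
        # Every subclass shares this constructor.
        plc_column, dtype = self._validate_args(plc_column, dtype)
        self.plc_column = plc_column
        self.dtype = dtype

    @classmethod
    def _validate_args(cls, plc_column, dtype):
        # Checks shared by all column types.
        if plc_column is None:
            raise ValueError("plc_column must not be None")
        return plc_column, dtype


class ToyNumericalColumn(ToyColumnBase):
    # No __init__ override: only the validation is specialized.
    @classmethod
    def _validate_args(cls, plc_column, dtype):
        # Run the shared checks first, then the type-specific one.
        plc_column, dtype = super()._validate_args(plc_column, dtype)
        if dtype not in {"int64", "float64"}:
            raise TypeError(f"{cls.__name__} does not support dtype {dtype!r}")
        return plc_column, dtype


col = ToyNumericalColumn([1, 2, 3], "int64")
print(type(col).__name__, col.dtype)  # ToyNumericalColumn int64
try:
    ToyNumericalColumn([1], "string")
except TypeError as exc:
    print(exc)  # ToyNumericalColumn does not support dtype 'string'
```

Because validation is a classmethod, each subclass overrides only the checks it needs and chains to the base with `super()`, while construction stays in one place.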
Replaced manual caching of _start_offset and _end_offset with @functools.cached_property, eliminating the need for a custom __init__ method in StringColumn.

Changes:
- Converted start_offset and end_offset from a manual @property with conditional caching to @cached_property
- Removed the __init__ override that initialized _start_offset and _end_offset
- Removed class-level type annotations for _start_offset and _end_offset
- Simplified the offset computation logic by dropping manual cache checks

Benefits:
- Eliminates the last __init__ override in column subclasses
- Reduces boilerplate code for caching
- More Pythonic use of standard library features
- Automatic caching behavior without manual state management

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5
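The `cached_property` approach can be sketched as follows, assuming a sliced string column described by a cumulative `offsets` list plus an `offset`/`size` window. `ToyColumn` and `ToyStringColumn` are hypothetical, and caching is only safe here because the sketch assumes a column's offsets never change after construction.

```python
import functools


class ToyColumn:
    """Hypothetical base class: the only __init__ in the hierarchy."""

    def __init__(self, offsets, offset=0, size=None):
        self.offsets = offsets  # cumulative character offsets (rows + 1 entries)
        self.offset = offset    # first row visible through this (sliced) column
        self.size = len(offsets) - 1 - offset if size is None else size


class ToyStringColumn(ToyColumn):
    # No __init__ override and no _start_offset/_end_offset attributes:
    # cached_property stores each result in the instance __dict__ on first use.

    @functools.cached_property
    def start_offset(self) -> int:
        return self.offsets[self.offset]

    @functools.cached_property
    def end_offset(self) -> int:
        return self.offsets[self.offset + self.size]


# Strings "abc", "de", "fghi"; the slice starting at row 1 covers "de", "fghi".
col = ToyStringColumn([0, 3, 5, 9], offset=1)
print(col.start_offset, col.end_offset)  # 3 9
print("start_offset" in col.__dict__)    # True: computed once, then cached
```

`functools.cached_property` needs a writable instance `__dict__`, so this pattern would not work on a class that defines `__slots__` without `__dict__`.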
- Wrap transpose() with access_columns() to prevent buffers from being incorrectly marked as exposed during pylibcudf calls
- Update __cuda_array_interface__ to use scope="external" to properly mark buffers as exposed when external code accesses the pointer
- Update test_df_transpose to expect buffers NOT to be exposed after normal DataFrame operations (only when explicitly accessed)
- Update test_series_zero_copy_cow_on expectations: shallow copies share memory with the source, so both change when the external array changes
- Remove exposed parameter from as_buffer() call in test_get_ptr

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5
bdice approved these changes Jan 8, 2026
mroeschke approved these changes Jan 8, 2026
Contributor: /merge
This was referenced Apr 6, 2026
Description
This PR removes the ability to construct columns and buffers that are already exposed, a state that is never actually possible in the current cudf data model. This change lets us simplify the various column constructors and standardize the validation process.
Relatedly, this PR ensures that converting a cudf ColumnBase to a pylibcudf Column unwraps Buffers, so that the pylibcudf representation is not subject to cudf's Buffer semantics. This should let us fully decouple the internal representation of pylibcudf Columns inside cudf from how they are exposed through public APIs, and it ensures that we do not break CoW and spilling by making Buffer copies that we shouldn't.
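The unwrapping idea can be illustrated with plain Python objects. In this hypothetical sketch, `ToyOwner` stands in for the raw memory owner, `ToyBuffer` for cudf's wrapper, and `to_plc()` for the conversion; the converted column keeps a reference to the owner itself, so deleting the wrapper does not free the data. These names are assumptions, not cudf or pylibcudf APIs.

```python
import gc
import weakref


class ToyOwner:
    """Hypothetical raw memory owner."""

    def __init__(self, nbytes: int):
        self.nbytes = nbytes


class ToyBuffer:
    """Hypothetical cudf-side wrapper that adds spill/CoW bookkeeping."""

    def __init__(self, owner: ToyOwner):
        self.owner = owner
        self.exposed = False


class ToyPlcColumn:
    """Hypothetical pylibcudf-side column: holds the owner, not the wrapper."""

    def __init__(self, data: ToyOwner):
        self.data = data


def to_plc(buffer: ToyBuffer) -> ToyPlcColumn:
    # Unwrap: pass the underlying owner so the wrapper's semantics do not leak.
    return ToyPlcColumn(buffer.owner)


wrapper = ToyBuffer(ToyOwner(64))
plc_col = to_plc(wrapper)
wrapper_ref = weakref.ref(wrapper)
del wrapper
gc.collect()  # not needed under CPython refcounting, harmless elsewhere
print(wrapper_ref() is None)  # True: the wrapper is gone
print(plc_col.data.nbytes)    # 64: the owner, and thus the data, is alive
```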
Checklist